BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a system and method for supporting
operations for input of user commands to household electric
appliances such as a television set/monitor and information
equipment, and in particular, to an interactive operation support
system and a method therefor, which permit input of user commands
to various kinds of connected equipment interactively.

More specifically, the present invention is concerned with an
interactive operation support system and a method therefor, which
are adapted for input of user commands to the equipment in a
natural form through a personified assistant, and in particular,
to an interactive operation support system and a method therefor,
which permit input of user commands by means of speech-based
interaction with a personified assistant.

2. Description of the Related Art

Conventionally, manually operated devices such as an operation
panel, a keyboard and a mouse have mainly been applied to the
user interface, that is, to input of commands to various kinds
of household electric equipment such as a television set/monitor
or information equipment including a personal computer. With the
improvement of the processing performance of processors and the
advance of recognition technology including speech recognition,
interactive user interfaces based on speech input are now
becoming widespread as well.

Since the user interface based on manual operation permits direct
and uniform input of commands to the equipment, a user may carry
out the input operations with certainty. However, the user has to
understand and become skilled in the techniques for operating the
equipment to a certain extent, and therefore, an excessive burden
is imposed on the user.

For instance, "a fingertip operation based interface" for
controlling a menu with a ten-key pad or the like is mainly used
in televisions and other AV equipment. However, it is to be
supposed that complicated operations will be required of an
interface in the above user input mode in order to deal with
network-connected household electric equipment.

While user interfaces using a commander are now in general use,
meeting the demands for multi-channel and multi-control operation,
including terrestrial (ground wave) broadcasting, satellite
systems, the Internet and HAVI (Home Audio/Video
Interoperability: a common command system for digital AV
equipment), requires an increasing number of switches, thus
making operation increasingly complicated. Combining the above
user interface with a multi-function switch and a menu screen
permits a reduction in the number of switches up to a certain
point; however, operation becomes very complex.

On the other hand, the user interface based on speech input makes
it possible to specify a command by analyzing a user request on
the equipment side based on the result of input speech
recognition, thereby relieving the user's burden when operating
the equipment. However, since it is necessary for the user to
speak into a microphone in the absence of a partner, such
operation can hardly be considered a natural human action.
Besides, the user may be subject to mental anguish when
interacting with such a kind of user interface.

In this connection, there has recently been provided an
interactive operation support system in which a personified
assistant appears on a display screen, permitting the user to
input commands to the equipment in the form of a face-to-face
conversation with the assistant on the screen.

In Japanese Patent Laid-open No. 11-65814, for instance, there is
disclosed a user interface which provides the user with a strong
sense of presence and reality by detecting a sound produced by
the user or the direction of a sound source and controlling a
visual image of an imaginary creature according to the result of
detection (i.e., the imaginary creature is set to follow the
source of sound by constantly gazing in the source direction).

Also, in Japanese Patent Laid-open No. 11-37766, there is
disclosed an agent device which provides a personified agent
having functions for communicating with a driver in a vehicle.
According to such an agent device, the personified agent is set
to behave in a manner suited to the current conditions of the
vehicle and the driver, based not only on those current
conditions but also on the learning effects of past history,
permitting the vehicle to establish communication with the
driver.

Recently, an increase in computer processing power or the like
has permitted high-level interactive processing and has also made
it possible to give a sort of intelligence to the assistant on
the screen. For instance, the assistant may not only execute an
input command simple enough to be formed by a single word, such
as a channel select command or a recording/reproduction start or
stop command, but also perform complicated operations across a
plurality of stages by following the context of the conversation
with the user.

However, a system capable of presenting, through the assistant on
the screen, the progress status of such operations has not been
developed so far, and as a result, the user has no choice but to
wait for a response from the system with eyes fixed on the
screen. It is even to be supposed that, if the user gives the
system a command to execute processing that requires a long
response time, the user may misunderstand that the equipment is
out of order.

Thus, in order to allow the user to operate the equipment based
on interaction with the assistant, it is preferable to provide an
operationally easy command input system that produces an effect
close to natural language.

SUMMARY OF THE INVENTION

The preferred embodiments of the present invention 
provide a system and/or a method for supporting operations 
for inputting user commands to household electric equipment 
such as a television set/monitor and information equipment. 

A preferred embodiment of the present invention provides a system
and/or method for supporting interactive operations, permitting
interactive input of user commands to the equipment.

Another preferred embodiment of the present invention provides a
system and/or a method for supporting interactive operations,
permitting input of user commands to the equipment in a natural
form through a personified assistant.

A further preferred embodiment of the present invention provides
a system and/or method for supporting interactive operations,
permitting input of user commands by means of speech-based
interaction with a personified assistant.

A still further embodiment of the present invention provides a
system and/or method for supporting interactive operations,
permitting feedback to the user on the progress of operations
specified by user commands inputted by means of speech-based
interaction with a personified assistant.

A first preferred embodiment of the present invention relates to
a system for supporting interactive operations for input of user
commands to electrical appliances or equipment. The system for
supporting interactive operations includes a display unit, a
speech input unit, a speech output unit and an operation control
unit, the operation control unit including an assistant control
means for generating a personified assistant and making the
generated assistant appear on a screen of the display unit, an
output speech control means for determining the speech required
for the assistant and outputting the assistant's speech to the
outside through the speech output unit after speech synthesis, an
input speech recognition means for recognizing a user voice
inputted through the speech input unit as speech, an interaction
management means for managing interaction between the user and
the assistant according to the assistant's speech determined by
the output speech control means and the user speech recognized by
the input speech recognition means, and a command interpreting
means for specifying the user's intention or the inputted user
command based on the content of interaction traced by the
interaction management means.

According to such preferred embodiment of the present invention,
the assistant control means may also be set to determine a proper
animation of the assistant based on the content of interaction
managed by the interaction management means and/or the inputted
user command specified by the command interpreting means.

The output speech control means may also be set to determine the
assistant's speech based on the content of interaction managed by
the interaction management means and/or the inputted user command
specified by the command interpreting means.

The output speech control means may also be set to determine
assistant's speech suitable for eliciting the user's intention
when the command interpreting means fails to specify the user's
intention or the inputted user command.
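The relationship among the interaction management means, the
command interpreting means and the output speech control means
described above may be sketched roughly as follows. This is only
an illustrative sketch: the keyword table, the class and function
names and the wording of the assistant's replies are assumptions
made for this example and are not specified in this document.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword table; the document does not say how commands are matched.
COMMAND_KEYWORDS = {
    "channel_select": ("channel",),
    "record": ("record",),
    "play": ("play", "reproduce"),
}


@dataclass
class InteractionManager:
    """Keeps the running history of user and assistant utterances."""
    history: list = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.history.append((speaker, text))

    def user_utterances(self) -> list:
        return [text for speaker, text in self.history if speaker == "user"]


def interpret_command(manager: InteractionManager) -> Optional[str]:
    """Return the command matched by the most recent matching user utterance,
    or None when no utterance in the history can be interpreted."""
    for text in reversed(manager.user_utterances()):
        lowered = text.lower()
        for command, keywords in COMMAND_KEYWORDS.items():
            if any(word in lowered for word in keywords):
                return command
    return None


def decide_assistant_speech(manager: InteractionManager) -> str:
    """Choose the assistant's next utterance and record it in the history.
    When interpretation fails, ask a question to draw out the user's intention."""
    command = interpret_command(manager)
    if command is None:
        speech = "What would you like me to do?"
    else:
        speech = f"OK, running {command}."
    manager.add("assistant", speech)
    return speech
```

In this sketch, an utterance that matches no keyword causes the
assistant to ask a follow-up question, while a later utterance
such as "Please record this program" is interpreted as the
hypothetical "record" command.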

The system for supporting interactive operations may further
comprise a connection means for connecting external equipment
such as a television set/monitor or a video deck to the system.
In this case, the command interpreting means may also be set to
interpret commands for control of external equipment functions,
including broadcasting program channel selection and
recording/reproduction in the video deck or the like.

The system for supporting interactive operations may further
comprise a communication means for connecting the system to a
communication medium such as an external network or a telephone
line. In this case, the input speech recognition means may also
be set to recognize audio data received via the communication
medium.

The system for supporting interactive operations may further
comprise a communication means for connecting the system to a
communication medium such as an external network or a telephone
line, and a mail exchange means for exchanging electronic mail
via the communication medium. In this case, the output speech
control means may also be set to determine the assistant's speech
based on the content of an incoming mail.

The interaction management means may also be set to manage an
input speech of one user as a message bound for another user. In
this case, the output speech control means may also be set to
determine the assistant's speech based on such message.

The assistant control means may also be set to place a
personified assistant in a room (a character room) having various
kinds of objects scattered around, including links to information
resources. For example, in response to the user's interest in a
recording medium which is placed in the room and includes a link
to music content, the command interpreting means may also be set
to interpret an inputted user command as a command to play back
the music content.

In response to the command interpreting means succeeding in
interpreting an inputted user command, the assistant control
means may also be set to allow the assistant to appear on the
screen of the display unit.

The system for supporting interactive operations may further
comprise a connection means for connecting a television
set/monitor to the system. In this case, in response to the
command interpreting means succeeding in interpreting an inputted
user command as a channel select command, the assistant control
means may also be set to make the assistant appear holding the
selected broadcasting program display window in its hand.

Otherwise, in response to the command interpreting means
interpreting an inputted user command as a channel change
command, the assistant control means may also be set to display
changeable broadcasting program display windows substantially in
the shape of a ring around the assistant. When a desired channel
is definitely selected by shifting the display windows in such a
way as to revolve on the ring under a channel change command from
the user, the assistant control means may also be set to zoom up
the selected broadcasting program display window.
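The ring-shaped arrangement described above can be illustrated
with a small geometric sketch. The function names, the use of
evenly spaced angles, the top-down (x, z) coordinates and the
linear zoom are all assumptions made for this example; the
document does not specify how the windows are positioned or
animated.

```python
import math


def ring_positions(num_windows, offset_steps=0, radius=1.0):
    """Place windows evenly on a ring seen from above.

    Returns one (x, z) pair per window. The window whose index equals
    offset_steps (modulo num_windows) sits at the front, i.e. at
    x = 0 and z = radius. Increasing offset_steps revolves the ring.
    """
    step = 2 * math.pi / num_windows
    positions = []
    for index in range(num_windows):
        angle = step * (index - offset_steps)
        positions.append((radius * math.sin(angle), radius * math.cos(angle)))
    return positions


def front_window(num_windows, offset_steps):
    """Index of the window currently at the front of the ring."""
    return offset_steps % num_windows


def zoom_scale(progress, start=1.0, end=3.0):
    """Linear zoom factor for an animation progress clamped to [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress
```

Under these assumptions, each channel change command would
increment offset_steps by one, and zoom_scale would be evaluated
over successive frames once the front window is definitely
selected.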

Still, the system for supporting interactive operations may
further comprise a connection means for connecting to the system
a secondary storage device for storing and reproducing
broadcasting program content. The secondary storage device
referred to in the present invention includes, for instance, a
video deck, a hard disc, a DVD-RAM (Digital Versatile Disc -
Random Access Memory) drive, a CD-R/W (Compact Disc [a trademark]
- ReWritable) drive or a similar media storage device capable of
recording mass media content. In this case, in response to the
command interpreting means interpreting an inputted user command
as a recorded program reproduction command, the assistant control
means may also be set to make the assistant appear holding in its
hand a binder showing a view of the recorded broadcasting program
contents. When the recorded broadcasting program content to be
reproduced is definitely selected, the assistant control means
may further be set to zoom up the selected recorded broadcasting
program content display window.

The system for supporting interactive operations may further
comprise a connection means for connecting a television
set/monitor to the system. In this case, in response to the
command interpreting means succeeding in interpreting an inputted
user command as a channel change command, the assistant control
means may also be set to make the assistant appear holding in its
hand a list of changeable broadcasting programs arranged in a
matrix shape. When a desired channel is definitely selected, the
assistant control means may further be set to zoom up the
selected broadcasting program display window. Further, an EPG
(Electronic Programming Guide) distributed as a part of data
broadcast may be applied to generate the list of broadcasting
programs in the matrix shape.

The system for supporting interactive operations may further
include a connection means for connecting a television
set/monitor to the system, a communication means for connecting
the system to a communication medium such as an external network
or a telephone line, and a mail exchange means for exchanging
electronic mail via the communication medium. In this case, the
assistant control means may also be set to allow an incoming mail
display image to appear on the screen of the display unit in
response to the receipt of a mail.

The system for supporting interactive operations may further
comprise a text or character conversion means for converting
ideograms such as Japanese Kanji, contained in text data
displayed on the screen of the display unit, into phonetic
characters such as Japanese Kana, or vice versa.
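As a rough illustration of such a character conversion, the
sketch below replaces Kanji words with Kana by longest-match
lookup in a reading table. The three-entry table and the function
name are hypothetical and serve only this example; a practical
conversion would need a full dictionary and context-dependent
readings, neither of which is described in this document.

```python
# Hypothetical reading table (Kanji word -> Kana reading) for illustration only.
READINGS = {
    "番組": "ばんぐみ",  # program
    "録画": "ろくが",    # recording
    "再生": "さいせい",  # reproduction
}


def to_kana(text, readings=READINGS):
    """Replace every known Kanji word in text with its Kana reading.

    At each position the longest table entry that matches is used;
    characters with no matching entry are copied unchanged.
    """
    longest = max(map(len, readings), default=0)
    converted = []
    position = 0
    while position < len(text):
        for length in range(min(longest, len(text) - position), 0, -1):
            chunk = text[position:position + length]
            if chunk in readings:
                converted.append(readings[chunk])
                position += length
                break
        else:
            # No table entry starts here; keep the character as it is.
            converted.append(text[position])
            position += 1
    return "".join(converted)
```

The reverse direction (Kana into Kanji) is not covered by this
sketch, since a single Kana string may correspond to several
Kanji words.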

The system for supporting interactive operations may further have
a communication means for connecting the system to a
communication medium such as an external network or a telephone
line, and a certifying means for certifying (authenticating) an
information terminal connected to the system via the
communication medium.

The system for supporting interactive operations may further
comprise a connection means for connecting a television
set/monitor to the system, and an extraction means for extracting
text information from received broadcasting program content. In
this case, the text information extracted by the extraction means
may also be superimposed on the content of a different
broadcasting program currently being displayed on the screen.

A second preferred embodiment of the present invention relates to
a method for supporting interactive operations, the method being
applied to equipment including a display unit, a speech input
unit and a speech output unit in order to support input of user
commands to the equipment or to other externally connected
equipment. The method for supporting interactive operations
includes an assistant control step for generating a personified
assistant and making the generated assistant appear on a screen
of the display unit, an output speech control step for
determining the speech required for the assistant and outputting
the assistant's speech to the outside through the speech output
unit after speech synthesis, an input speech recognition step for
recognizing a user voice inputted through the speech input unit
as speech, an interaction management step for managing
interaction between the user and the assistant according to the
assistant's speech determined by the output speech control step
and the user speech recognized by the input speech recognition
step, and a command interpreting step for specifying the user's
intention or the inputted user command based on the content of
interaction traced by the interaction management step.

According to the preferred embodiment of the present invention,
the assistant control step may also be set to determine a proper
animation of the assistant based on the content of interaction
managed by the interaction management step and/or the inputted
user command specified by the command interpreting step.

The output speech control step may also be set to determine the
assistant's speech based on the content of interaction managed by
the interaction management step and/or the inputted user command
specified by the command interpreting step.

The output speech control step may also be set to determine
assistant's speech suitable for eliciting the user's intention
when the command interpreting step fails to specify the user's
intention or the inputted user command.

When the equipment further includes a connection means for
connecting external equipment such as a television set/monitor or
a video deck to the equipment, the command interpreting step may
also be set to interpret commands for controlling external
equipment functions, including broadcasting program channel
selection and/or recording/reproduction in the video deck or the
like.

When the equipment further includes a communication means for
connecting the equipment to a communication medium such as an
external network or a telephone line, the input speech
recognition step may also be set to recognize audio data received
via the communication medium.

When the equipment further includes a communication means for
connecting the equipment to a communication medium such as an
external network or a telephone line, and a mail exchange means
for exchanging electronic mail via the communication medium, the
output speech control step may also be set to determine the
assistant's speech based on the content of an incoming mail.

The interaction management step may also be set to manage 
an input speech of one user as a message bound for the other 
user, and the output speech control step may also be set to 
determine the assistant's speech based on the message. 

The assistant control step may also be set to place a personified
assistant in a room in which various kinds of objects, including
links to information resources, are scattered around. For
instance, in response to the user's interest in a recording
medium which is placed in the room and includes a link to music
content, the command interpreting step may also be set to
interpret an inputted user command as a command to play back the
music content.

In response to the command interpreting step succeeding in
interpreting an inputted user command, the assistant control step
may also be set to allow the assistant to appear on the screen of
the display unit.

When the equipment further includes a connection means for
connecting a television set/monitor to the equipment, in response
to the command interpreting step succeeding in interpreting an
inputted user command as a channel select command, the assistant
control step may also be set to make the assistant appear holding
the selected broadcasting program display window in its hand.

Otherwise, in response to the command interpreting step
succeeding in interpreting an inputted user command as a channel
change command, the assistant control step may also be set to
display changeable broadcasting program display windows in a
ring-shaped form around the assistant. When a desired channel is
selected by shifting the display windows in such a way as to move
along the ring-shaped form under a channel change command from
the user, the assistant control step may further be set to zoom
up the selected broadcasting program display window.

When the equipment further includes a connecting means for
connecting to the equipment a secondary storage device for
storing and reproducing broadcasting program content, in response
to the command interpreting step succeeding in interpreting an
inputted user command as a recorded program reproduction command,
the assistant control step may also be set to make the assistant
appear holding in its hand a binder showing a view of the
recorded broadcasting program contents. When the recorded
broadcasting program content to be reproduced is definitely
selected, the assistant control step may further be set to zoom
up the selected recorded broadcasting program content display
window. The secondary storage device referred to in the present
invention includes, in addition to the video deck, a hard disc, a
DVD-RAM drive, a CD-R/W drive or a similar media storage device
capable of recording mass media content.

When the equipment further includes a connection means for
connecting a television set/monitor to the equipment, in response
to the command interpreting step succeeding in interpreting an
inputted user command as a channel change command, the assistant
control step may also be set to make the assistant appear holding
in its hand a list of changeable broadcasting programs arranged
in a matrix shape. When a desired channel is definitely selected,
the assistant control step may further be set to zoom up the
selected broadcasting program display window. Further, an EPG
(Electronic Programming Guide) distributed as a part of data
broadcast may be applied to generate the list of broadcasting
programs in the matrix shape.

When the equipment further includes a connection means for
connecting a television set/monitor to the equipment, a
communication means for connecting the equipment to a
communication medium such as an external network or a telephone
line, and a mail exchange means for exchanging electronic mail
via the communication medium, the assistant control step may also
be set to allow an incoming mail display image to appear on the
screen of the display unit in response to the receipt of a mail.

The method for supporting interactive operations may further
include a text conversion step for converting, for example,
Japanese Kanji ideograms contained in text data displayed on the
screen of the display unit into phonetic symbols such as Japanese
Kana. This step can also be applied to conversion of displayed
data from one system or group of characters or codes into another
system or group of characters or codes.

The method for supporting interactive operations may further
include a communication step for connecting the equipment to a
communication medium such as an external network or a telephone
line, and a certifying step for certifying (authenticating) an
information terminal connected to the equipment via the
communication medium.

When the equipment further includes a connection means for
connecting a television set/monitor to the equipment, the method
for supporting interactive operations may further comprise an
extraction step for extracting text information from received
broadcasting program content. The text information extracted by
the extraction step may also be superimposed on the content of a
different broadcasting program currently being displayed on the
screen.

A third preferred embodiment of the present invention relates to
a storage medium in which computer software describing
interactive operation support processing for execution on a
computer system is physically stored in a computer readable form,
the interactive operation support processing being applied to
equipment including a display unit, a speech input unit and a
speech output unit in order to support input of user commands to
the equipment or to other externally connected equipment. The
computer software of the storage medium includes an assistant
control step for generating a personified assistant and making
the generated assistant appear on a screen of the display unit,
an output speech control step for determining the speech required
for the assistant and outputting the assistant's speech to the
outside through the speech output unit after speech synthesis, an
input speech recognition step for recognizing a user voice
inputted through the speech input unit as speech, an interaction
management step for managing interaction between the user and the
assistant according to the assistant's speech determined by the
output speech control step and the user speech recognized by the
input speech recognition step, and a command interpreting step
for specifying the user's intention or the inputted user command
based on the content of interaction traced by the interaction
management step.

The storage medium according to the third preferred embodiment of
the present invention refers to a medium which physically
provides the computer software, in a computer readable form, to a
general-purpose computer system capable of executing various
program codes, for instance. Such a storage medium includes, for
instance, a CD (Compact Disc - a trademark), an FD (Floppy Disc -
a trademark), an MO (Magneto-optical Disc) or any other
detachable and portable storage medium. It is also technically
possible to provide the computer software to a specific computer
system in a computer readable form via a transmission medium such
as a network (whether wired or wireless) or the like.

The above storage medium defines the structural or functional
cooperative relation between the computer software and the
storage medium for the purpose of performing the predetermined
computer software functions on the computer system. In other
words, installing the predetermined computer software into the
computer system through the storage medium according to the third
preferred embodiment of the present invention makes it possible
to perform cooperative functions on the computer system and, as a
result, may produce functional effects similar to those of the
system or method for supporting interactive operations according
to the first or second preferred embodiment of the present
invention.

According to the system and method for supporting interactive
operations of the present invention, applying to the user
interface an animated character called a personified assistant,
which makes reactions based on speech analysis and animation,
permits the user to feel friendly toward the user interface and
simultaneously makes it possible to meet a demand for complicated
commands or to provide the user with an entry into services.
Further, since there is provided a command system producing an
effect close to natural language, the user may easily operate the
equipment with the same feeling as ordinary human conversation.



BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects and features of the 
invention will become apparent from the following description 
of preferred embodiments of the invention with reference to 
the accompanying drawings, in which: 

Fig. 1 schematically illustrates the hardware configuration of a
system 1 for supporting interactive operations for use in a
preferred embodiment of the present invention;

Fig. 2 shows a command processing system in the interactive
operation support system 1 according to a preferred embodiment of
the present invention;

Fig. 3 shows a character control system in the interactive
operation support system 1 according to a preferred embodiment of
the present invention;

Fig. 4 is a block diagram showing the principal configuration
required for speech-input-based command processing in the
interactive operation support system 1 according to a preferred
embodiment of the present invention;

Fig. 5 is a flow chart schematically showing the flow of the
character control processing according to a preferred embodiment
of the present invention;

Fig. 6 is a view showing a display screen which appears
immediately after the application of power to a television
monitor 25 according to a preferred embodiment of the present
invention;

Fig. 7 is a view showing the state in which a command is given to
an assistant by means of speech input in a natural language form
according to a preferred embodiment of the present invention;

Fig. 8 is a view showing display on the screen when direct
channel select operation is performed through an assistant
according to a preferred embodiment of the present invention;

Fig. 9 is a view showing display on the screen when direct
channel select operation is performed through the assistant
according to a preferred embodiment of the present invention;

Fig. 10 is a view showing display on the screen when direct
channel select operation is performed through the assistant
according to a preferred embodiment of the present invention;

Fig. 11 is a view showing display on the screen when direct
channel select operation is performed through the assistant
according to a preferred embodiment of the present invention;

Fig. 12 is a flow chart showing the procedure of implementing a
user interface based on a direct command form according to a
preferred embodiment of the present invention;

Fig. 13 is a view showing a multi-view screen, which 
permits the user to watch the whole programs currently on air 

25 on the respective channels in one view according to a preferred 
embodiment of the present invention; 

Fig . 1 4 is a view showing the state , in which each channel 
display panel on the multi-view screen is shifted in such a 
way as to revolve on a ring under a channel change command 

30 from the user according to a preferred embodiment of the present 
invention; 
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Fig. 15 is a virtual view showing the multi-view screen 
when viewed from the above according to a preferred embodiment 
of the present invention; 

Fig. 16 is a view showing the state, in which a channel 
5 C is highlighted after being temporarily selected according 
to a preferred embodiment of the present invention; 

Fig. 17 is a view showing the state, in which after 
transition of the channel C from the temporarily selected state 
to the definitely selected state, a selected program display 
10 panel is gradually zoomed up according to a preferred embodiment 
of the present invention; 

Figs. 18A and 18B are views showing the state, in which
a program recording command is given to the assistant by means
of input of a speech according to a preferred embodiment of
the present invention;

Fig. 19 is a view showing display on the screen, when 
reproduction of a recorded program content is performed through 
the assistant according to a preferred embodiment of the present 
invention; 

Fig. 20 is a view showing display on the screen, when
reproduction of a recorded program content is performed through
the assistant according to a preferred embodiment of the present
invention;

Fig. 21 is a view showing display on the screen, when
reproduction of a recorded program content is performed through
the assistant according to a preferred embodiment of the present
invention;

Fig. 22 is a view showing display on the screen, when
reproduction of a recorded program content is performed through
the assistant according to a preferred embodiment of the present
invention;

Fig. 23 is a view showing display on the screen, when
reserved recording is set through the assistant according to
a preferred embodiment of the present invention;

Fig. 24 is a view showing display on the screen, when
reserved recording is set through the assistant according to
a preferred embodiment of the present invention;

Fig. 25 is a view showing display on the screen, when
reproduction of a recorded program content is specified on
a daily basis through the assistant according to a preferred
embodiment of the present invention;

Fig. 26 is a view showing display on the screen, when
reproduction of a recorded program content is specified on
a daily basis through the assistant according to a preferred
embodiment of the present invention;

Fig. 27 is a view showing interaction carried on through
the medium of the assistant when a mail is accepted;

Fig. 28 is a view showing interaction carried on through
the medium of the assistant when a mail is accepted according
to a preferred embodiment of the present invention;

Fig. 29 is a view showing interaction carried on through
the medium of the assistant when a mail is accepted according
to a preferred embodiment of the present invention;

Fig. 30 is a flowchart showing the procedure of displaying
an incoming mail image on the monitor screen according to a
preferred embodiment of the present invention;

Fig. 31 is a view showing the state of message and bulletin
board functions performed through the medium of the assistant
according to a preferred embodiment of the present invention;

Fig. 32 is a view showing a mechanism of the interactive
operation support system 1 according to a preferred embodiment
of the present invention for accepting an inputted user command
from an information terminal in a remote place;

Fig. 33 is a view showing the state, in which remote
control from the user is accepted to the interactive operation
support system 1 according to a preferred embodiment of the
present invention through a personified assistant;

Fig. 34 is a view showing the state, in which remote
control from the user is accepted to the interactive operation
support system 1 according to a preferred embodiment of the
present invention through the personified assistant;

Fig. 35 is a view showing the state, in which remote
control from the user is accepted to the interactive operation
support system 1 according to a preferred embodiment of the
present invention through the personified assistant;

Fig. 36 is a flowchart showing the procedure of informing
the user about text information of the content of a program
on a different channel according to a preferred embodiment
of the present invention;

Fig. 37 is a view showing the location of score display
areas displayed on the content of a program now being projected
on the screen according to a preferred embodiment of the present
invention;

Fig. 38 is a flow chart showing the procedure of setting
time information based on time information displayed on the
content of a program now being projected on the screen according
to a preferred embodiment of the present invention; and

Fig. 39 is a view showing the location of time display
areas on the content of a program now being projected on the
screen according to a preferred embodiment of the present
invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS



Fig. 1 schematically illustrates a hardware
configuration of a system 1 for supporting interactive
operations for use in a preferred embodiment of the present
invention. The system 1 is configured as a receiving device
for a television set/monitor, such as an STB (Set Top Box), for
instance, and is connected to the television set/monitor. The
system 1 may provide support services for user operations such
as channel selection and recording/reproduction by carrying
on interaction with a user through the mediation of an assistant,
which will be described later, and by interpreting the user's
explicit or latent intention based on the interaction. A
description will now be given of each unit with reference to Fig. 1.

A central control unit 11 refers to an operation control
unit for controlling the operations in the interactive
operation support system 1 as a whole according to a
predetermined control program, and performs the processing of
generating an assistant as a partner for the user, allowing
action patterns of the assistant to appear and managing the
interaction between the user and the assistant based on
input/output of speeches and images, for instance.

The following functions are contained in the control
program executed in the central control unit 11, for instance:

(1) Operational control of each unit in the system 1
according to an inputted user command resulting from
recognition of a speech inputted through a microphone 22;

(2) Control of various kinds of external equipment connected
to the system through an input/output interface 17;

(3) Control of a tuner 15;

(4) Character control relating to the assistant (generation
of the animation corresponding to the input command resulting
from speech recognition);

(5) Speech synthesis (conversion of audio data produced from
the character into an audio signal, provided that the animation
of character lip motions or the like is synchronized with the
speech);

(6) Control of connection or the like of the system to an
external network through a communication interface 18;

(7) Control of EPG (Electronic Programming Guide) and other
data for data broadcast;

(8) Control of output of a speech through a speaker 21;

(9) Control of display on a screen through a monitor 25;

(10) Control according to the inputted command through a
remote controller (not shown);

(11) Processing of text data for use in electronic mails,
EPG and a wide area network;

(12) Conversion of text data based on user profiles (for
converting Kanji into Kana for children, for example);

(13) Image measurement based on data accompanying a video
signal (for extracting information relating to the progress
of scoring in sports programs such as a baseball game and
a soccer game, and time information, from a program content
displayed on the screen) and various kinds of services based
on image recognition (for informing the user about the
information related to the progress of scoring and also for
setting time information or the like);

(14) Bit map conversion requiring a font selected from a font
database based on text data;

(15) Combination of texture selected from a texture database
with a font bit map; and

(16) Basic settings of the system (including screen
brightness, sound volume and various kinds of input/output
operations).

A tuner 15 performs tuning of broadcast wave of a
predetermined channel, that is, channel selection according
to a command from the central control unit 11. The received
broadcast wave is separated into a video data portion and an
audio data portion. The video data is outputted to the monitor
25 through an image processing unit 16 for display on the screen,
while the audio data is outputted to the speaker 21 through
a speech synthesis unit 14 for production of sounds
(alternatively, line output will be enough as well).

A storage device 12 is used for storage of data required
for generation of images and action patterns of the assistant.
The following information is included in the data stored, for
instance:

(1) 3D-character image information of the assistant and data
required for generating the animation of the assistant;

(2) Layout and other information relating to a character
room adapted to bring the assistant into action;

(3) User profile information of the user who carries on the
interaction with the assistant;

(4) History of user-assistant conversations in the past and
other interchanges, and character/feeling and learning data
based on the history; and

(5) Advertising contents to be mapped into the assistant
or the character room.

The storage device 12 also performs storage of various
kinds of databases (not shown) such as a font database and
a texture database, in addition to storage of information
relating to the assistant. The font database is used for
management of various kinds of fonts required for EPG, an
electronic message board and an electronic mail or the like.
The texture database is used for management of various kinds
of textures (2D-bit map data or the like) required for the
EPG, the electronic bulletin board and the electronic mail
or the like.

While the system 1 may be set to make only a single
assistant appear, a different assistant may also be provided
for each user. That is, a plurality of characters different
in age, sex and character are available, so that the system 1
may automatically select a character according to the user
profile on the occasion of initial log-in, or may permit the
user to select a character and enter the selected character
in association with the user profile or the like. Otherwise,
assistant learning/history data may also be provided for each
user, so that even the same assistant is set to make different
reactions according to each user.

The speech recognition unit 13 recognizes an audio
signal, i.e., a user speech supplied through a speech input
device such as the microphone 22, as text information, and then
analyzes the inputted user command converted into a text format
with the use of a language database (not shown). More
specifically, the text is divided into word units through
morpheme analysis, and language information such as syntactic
information and conversational information is gained through
syntactic/semantic analysis to understand the inputted user
command or the like, which is then outputted to the central
control unit 11.

The input/output interface 17 refers to a device for
connecting the external equipment such as a video deck 23 and
a personal computer (PC) 24 to the system 1. One or more pieces
of AV equipment and information equipment may be connected
externally to the system 1 according to an interface specification
such as i-link (IEEE1394), for instance.

The communication interface 18 refers to a device for
mutually connecting the system 1 to other host computer systems
on an external network. The external network is equivalent
to a wide area network such as the Internet, for instance.
On the network, there are provided various kinds of servers
such as a WWW (World Wide Web) server to distribute WWW resources
described in HTML format, a mail server to provide mail
exchange services to each user account and an advertising server
to distribute advertising contents updated every moment. In
the embodiment of the present invention, it is to be understood
that at least one of the servers on the network should be a
character server to distribute, as a free or paid service,
character data of images, animations and character/action
models relating to the assistant required for support of
interactive operations.

In addition to the above servers, the network also
involves information distribution servers such as "Season
Database" constructed by collecting public institution
services or the like, "Weekly" to distribute a weather report,
a broadcasting program guide or the like every week, "Daily"
to distribute news, advertisements and other highly immediate
information every day, and "Timely" to distribute
constantly changing information like stock quotations,
exchange rates and traffic information, as well as a commerce
server to distribute services of physical distribution sales and
settlement of accounts (electronic settlement of accounts)
and an Internet service provider or the like.

In case of a TCP/IP (Transmission Control
Protocol/Internet Protocol) network, for instance, since the
resources distributed from each server are identified in URL
(Uniform Resource Locator) format, the system 1 may download
these information resources according to a predetermined
protocol such as HTTP (Hypertext Transfer Protocol). Thus,
the interactive operation support system 1 according to the
embodiment of the present invention may update the images and
character/functions or the like of the assistant by
re-downloading the active character data cyclically or at the
desired timing.

Transactions executable by the interactive operation
support system 1 via the network are given as follows:

(1) To update a control program for driving each unit in
the system 1;

(2) To download a character constituting the assistant;

(3) To download font data;

(4) To download texture data;

(5) To issue a request to substitute program recording for
a recording means (video tape or like media, for instance)
which is not provided although desired (refer to "Recording
substituting system" disclosed in Japanese Patent Laid-open
No. 2000-162320 already assigned to the present applicants);

(6) To analyze a user profile and to customize the system
for the user;

(7) To utilize public institution services;

(8) To acquire a weather report, a program guide, news,
traffic information and advertisement or the like;

(9) To carry out electronic commercial transactions; and

(10) To perform character control via the network (with the
use of a speech, an electronic mail and a control Web page or
the like).

A modem 19 refers to a device for transferring digital
computer data via a public telephone line such as PSTN (Public
Switched Telephone Network) and performs modulation into an
analog signal and demodulation into a digital signal.

A telephone installed in each house is connected to the
public telephone line through a switchboard, for instance.
The public telephone line is also connected to a wireless
telephone network. Thus, the interactive operation support
system 1 according to the preferred embodiment of the present
invention permits exchange of data to and from the installed
telephone and a mobile telephone. The assistant automatically
generated in the central control unit 11 may also be set to
interpret inputted user commands based on the interaction with
the user through the mobile telephone.

The central control unit 11 carries out the
interaction between the user and the system 1 by controlling
the operations in the system 1 in accordance with the result
of speech recognition by the speech recognition unit 13.

When the input user speech is interpreted as a
conversation with the assistant, for instance, the central
control unit 11 determines assistant reactions based on a speech
and animations after determining the motions of the assistant
according to learning/history data and the action models
relating to the assistant.

The assistant's speech is outputted to the outside
through the speaker 21 after being synthesized by the speech
synthesis unit 14. When the sound of a program now on
the air is being produced, the assistant's speech may also be
outputted after being superposed on the sound.

The assistant motion is synthesized with an image by
an image processing unit 16 with reference to 3D-character
information and animation information. At this time, a
background (scene) can be changed over with reference to
character room information as occasion demands. Otherwise,
an assistant image or a character room, where the assistant
is found, may be displayed in the state of superposition with
the contents of one or more programs now on the air (which
will be described later in detail).

When the input user speech is interpreted as a channel 
change command, the central control unit 11 transfers a channel 
number to the tuner 15 for channel selection. 

When the input user speech is interpreted as a command
(start and end of recording and reproduction operations, fast
forward, rewind and playback from each head and file transfer,
for instance) to the external equipment such as the video deck
23 and the personal computer 24, the central control unit 11
issues a required command to the associated equipment via the
input/output interface 17.

When the user input speech is interpreted as a command
for access to the wide area network, the central control unit
11 transfers a request for access to a specified host system
on the network via the communication interface 18. When the
network is a TCP/IP network such as the Internet, for
instance, the request for access is described in URL format.
In such a case, the user may read out the URL or may utter a
few words (a home page title, for instance) uniquely related
to the URL. In the latter case, the speech inputted through
the assistant is converted into the URL after being recognized.
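
The routing described in the preceding paragraphs can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the class name CommandRouter, the command kinds, the title-to-URL table and the example URL are all hypothetical and do not appear in the embodiment; only the unit names (tuner 15, input/output interface 17, communication interface 18) are taken from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CommandRouter:
    """Hypothetical stand-in for the routing done by the central control unit 11."""

    # Assumed lookup table from spoken page titles to URLs.
    title_to_url: Dict[str, str] = field(default_factory=dict)
    # Record of (destination unit, payload) pairs, kept for inspection.
    sent: List[Tuple[str, str]] = field(default_factory=list)

    def resolve_url(self, spoken: str) -> str:
        # A URL that was read out is used as-is; otherwise the words are
        # treated as a title uniquely related to a URL (KeyError if unknown).
        if "://" in spoken:
            return spoken
        return self.title_to_url[spoken]

    def dispatch(self, kind: str, payload: str) -> str:
        # Each branch mirrors one of the cases described in the text.
        if kind == "channel_change":
            unit = "tuner 15"
        elif kind == "equipment":
            unit = "input/output interface 17"
        elif kind == "network":
            unit = "communication interface 18"
            payload = self.resolve_url(payload)
        else:
            raise ValueError(f"unknown command kind: {kind!r}")
        self.sent.append((unit, payload))
        return unit


router = CommandRouter(title_to_url={"example home page": "http://www.example.com/"})
router.dispatch("channel_change", "8")
router.dispatch("equipment", "video deck 23: start recording")
router.dispatch("network", "example home page")
```

In this sketch the speech recognition step is assumed to have already classified the utterance; the router only decides which unit receives the resulting command.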

The system 1 may also be set to accept an inputted user
command from a remote controller (not shown), similarly to
the conventional AV equipment. In such a case, a unit for
receiving wireless (infrared) data transmitted from the remote
controller and a decoder for interpreting data received are
required, and decoded data and commands are processed in the
central control unit 11.

The interactive operation support system 1 according
to the preferred embodiment of the present invention is
characterized by performing input of commands from the user
on a speech base by carrying on the interactive processing
through the medium of the assistant generated on the screen
of the monitor 25. The assistant referred to in the embodiment
is a 3D-character having animation functions. The interactive
processing is set to involve the processing of interpreting
a user-assistant conversation (a daily conversation) to
extract user commands and that of providing user feedback
through assistant reactions.

According to the interactive operation support system
1 of the preferred embodiment of the present invention, applying
the personified assistant making reactions based on speech
synthesis and animations to the user interface permits the
user to feel friendly toward the user interface, and
simultaneously makes it possible to meet a demand for
complicated commands or to provide an entry into services for
the user. Further, since the interactive operation support
system 1 has a command system close to natural human
language, the user may easily operate the equipment with the
same feeling as in ordinary conversations.

Fig. 2 shows a command processing system in the
interactive operation support system 1 according to the
preferred embodiment of the present invention.

In the speech recognition unit 13, the user speech
inputted through the microphone is recognized as text
information, and further, the inputted user command converted
into a text format is analyzed with the use of a language database
(not shown).

In the preferred embodiment of the present invention,
commands provided as practicable commands include "Character
control" for generating assistant motions in response to the
inputted user command, "Equipment control" for instructing
the video deck and other external equipment connected to the
system 1 to operate, "Speech synthesis" for generating an
assistant's speech in response to the inputted user command,
"Mail" for making an exchange of mails through the network
and "Bulletin board" for making an exchange of messages among
a plurality of (unspecified) users or the like.

"Character control" refers to a command to control the 
system 1 and the equipment externally connected thereto in 
cooperation with the assistant motions (in other words, with 
the assistant motions as feedback) in response to the inputted 
user command on a speech base.

Fig. 3 illustrates a character control system in the
interactive operation support system 1 according to the
preferred embodiment of the present invention.

As shown in Fig. 3, the system 1 may perform function
commands such as "Channel selection", "Channel change", "AV
equipment control", "Mail read", "Mail write", "Bulletin board
read", "Bulletin board write" and "Ambient" through character
control.

In this embodiment, "Ambient" means the functions of
setting the character constituting the assistant to make
motions in a proper way or to act as if urging the user to
input commands by means of speech synthesis when the system
is placed in the wait state.

Synchronization of speech with animation for automatic
lip sync (pronouncing lip shape) is preferably required for
the system 1 to perform "Mail read", "Bulletin board read" or
the like functions of setting the character to read out text
information.

The procedure of performing the function commands
through character control will be described below.

The character is set to inform the user about the current
status by means of speech synthesis or the like. The system
1, when remote-controlled by the user via the network or through
the mobile telephone, may use an electronic mail or like
means to inform the user about the status.

Instead of a single character constituting the assistant,
characters customized individually for every user may also be
provided for the same interactive operation support system
1. A type or model and animation of each character can be
updated or changed through a communication means such as a
network, a recording medium or broadcasting, for instance.
Further, advertising or other information contents may be
dynamically mapped into clothes texture of each character.

Fig. 4 illustrates a basic configuration required for
the command processing on a speech base in the interactive
operation support system 1 according to a preferred embodiment
of the present invention.

The speech inputted through the microphone 22 is 
converted into text base information after being recognized 
by the speech recognition unit 13. 

The central control unit 11 performs the processing of
interaction with the user based on the text information to
understand the user command given in a form close to a natural
language.

The central control unit 11 then controls the operation
of the AV equipment externally connected to the interactive
operation support system 1 according to the result of command
interpretation. In addition, the central control unit 11 gives
the user feedback on the conditions of execution of the commands
by means of speech synthesis or by generating character
animation.

A description will now be given of some embodiments
relating to the command interpretive processing carried out by
the interactive operation support system 1 on the speech input
base according to the preferred embodiment of the present
invention.
[Embodiment 1]

An input user speech in Japanese, "Video 1 kara Video
2 ni dubbing shite", which is equivalent in English to a command
of "Dub a recorded content in Video 1 into Video 2", is processed
as follows.

→ The system 1 converts the input user speech into text
and further divides it into lexical units of "Video", "1", "kara",
"Video", "2", "ni", "dubbing" and "shite".

→ The system 1 further sorts this command form into two
kinds of lexical units, that is, one representing two kinds
of equipment specified by "Video", "1" and "2" and the other
representing a single command specified by "dubbing".

→ The system 1 then analyzes these lexical units to
generate an equipment control command of "Video" "1" "kara"
"Video" "2" "ni" "dubbing", which is equivalent in English to
a command of "Dub a recorded content in Video 1 into Video 2".
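
The three steps of Embodiment 1 can be traced with a short sketch. It is a simplification under stated assumptions: the romanized utterance is split on spaces instead of being subjected to morpheme analysis, and the word tables, the role names "source" and "destination" and the function name parse_command are hypothetical names introduced only for illustration.

```python
# Assumed word tables; the embodiment does not specify how they are stored.
EQUIPMENT_WORDS = {"Video", "DVD"}
COMMAND_WORDS = {"dubbing"}
PARTICLE_ROLES = {"kara": "source", "ni": "destination"}


def parse_command(utterance):
    """Divide an utterance into lexical units and build a control command."""
    units = utterance.split()  # step 1: lexical units
    command = {"command": None, "source": None, "destination": None}
    last_equipment = None
    i = 0
    while i < len(units):
        unit = units[i]
        if unit in EQUIPMENT_WORDS:
            # step 2: an equipment word plus an optional number names a device
            last_equipment = unit
            if i + 1 < len(units) and units[i + 1].isdigit():
                last_equipment = f"{unit} {units[i + 1]}"
                i += 1
        elif unit in PARTICLE_ROLES and last_equipment is not None:
            # the particle decides the role of the device just named
            command[PARTICLE_ROLES[unit]] = last_equipment
        elif unit.lower() in COMMAND_WORDS:
            command["command"] = unit.lower()
        # other units such as "shite" carry no control information here
        i += 1
    return command  # step 3: the equipment control command


result = parse_command("Video 1 kara Video 2 ni dubbing shite")
```

Here result holds the command "dubbing" with "Video 1" as the source and "Video 2" as the destination, which corresponds to "Dub a recorded content in Video 1 into Video 2".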

[Embodiment 2]

An input user speech in Japanese, "Video wo Dubbing",
which is equivalent in English to a command of "Dub a recorded
content in Video", is processed as follows.

→ The system 1 divides the input user speech into lexical
units of "video", "wo" and "dubbing".

→ Since the lexical unit representing a command specified
by "dubbing" is contained in this command form, it may be
supposed that there are two kinds of equipment. Then, the
system 1 provides, for the user, a speech asking from which
video to which video the dubbing is to be performed. In
response to the question from the system, a command of "Video
1 kara Video 2 e", which is equivalent in English to a command
of "From Video 1 to Video 2", is given from the user to the
system.

→ The system 1 further divides the above command form
into lexical units of "Video", "1", "kara", "Video", "2" and
"e".

→ After re-input of the lexical units that were insufficient
for lexical interpretation, the system 1 generates a command of
"Video 1 kara Video 2 e dubbing", which is equivalent in English
to a command of "Dub a recorded content in Video 1 into Video 2".
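
The question-and-re-input exchange of Embodiment 2 amounts to filling the missing parts of a command over two turns. The following is a minimal sketch of that idea, assuming the lexical units have already been obtained; the slot names, the wording of the question and the function names fill_slots and question_for are hypothetical.

```python
REQUIRED_SLOTS = ("command", "source", "destination")


def fill_slots(units, slots=None):
    """Merge the information carried by lexical units into a slot table."""
    slots = dict(slots or {})
    last_equipment = None
    i = 0
    while i < len(units):
        unit = units[i]
        if unit.lower() == "video":
            # "Video" alone does not identify a deck; a number must follow.
            if i + 1 < len(units) and units[i + 1].isdigit():
                last_equipment = f"Video {units[i + 1]}"
                i += 1
            else:
                last_equipment = None
        elif unit == "kara" and last_equipment:
            slots["source"] = last_equipment
        elif unit in ("ni", "e") and last_equipment:
            slots["destination"] = last_equipment
        elif unit.lower() == "dubbing":
            slots["command"] = "dubbing"
        i += 1
    return slots


def question_for(slots):
    """Return a question for the user, or None when nothing is missing."""
    missing = [name for name in REQUIRED_SLOTS if name not in slots]
    if not missing:
        return None
    if "source" in missing or "destination" in missing:
        return "From which video to which video?"
    return "What should be done?"


first = fill_slots(["video", "wo", "dubbing"])
asked = question_for(first)
second = fill_slots(["Video", "1", "kara", "Video", "2", "e"], first)
```

After the first turn only the command is known, so a question is produced; the second turn supplies both pieces of equipment and no further question is needed.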

[Embodiment 3]

An entry of synonyms is required for lexical
interpretation to cope with a wide range of expressions.

For instance, a lexical unit of "1 channel" covers all
the following modes of expression in the Tokyo area.

"Ichi"
"I chan"
"I channel"
"Sogo"
"Sogo TV"
"NHK sogo"
"NHK sogo TV"
"NHK"
[Embodiment 4]

The command forms are classified into single, double,
triple command forms or the like.

The single command form refers to a command formed by
a single lexical unit such as "NHK" and "Yon-channel", for
instance.

The double command form is composed of lexical units
representing a single piece of equipment and a single command.

For instance, a command composed of lexical units of
"Television", "rokuga" and "shite" (provided that omission
of "shite" is possible), which is equivalent in English to a
command of "Record a television program", and a command composed
of lexical units of "Rokuga", "shitamonoo" and "misete", which
is equivalent in English to a command of "Playback recorded
program content", are involved in the double command form.

The triple command form is composed of lexical units
representing two kinds of equipment and a single command.
For instance, a command composed of lexical units of "Video",
"1", "kara", "Video", "2", "ni" and "dubbing", which is
equivalent in English to a command of "Dub a recorded content
in Video 1 into Video 2", and a command composed of lexical
units of "DVD", "wo", "Video", "1", "ni" and "Copy", which is
equivalent in English to a command of "Dub a recorded content
in DVD into Video 1", and so on are involved in the triple
command form.
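
The classification of Embodiment 4 can be expressed as a small rule on the number of lexical units and the number of pieces of equipment they name. This is only a sketch of that rule: the function name is hypothetical, and the list of equipment is assumed to have been identified by an earlier analysis step.

```python
def classify_command_form(units, equipment):
    """Classify a command by its lexical units and the equipment they name."""
    if len(units) == 1:
        return "single"  # e.g. a channel name on its own
    if len(equipment) == 1:
        return "double"  # one piece of equipment and one command
    if len(equipment) == 2:
        return "triple"  # two pieces of equipment and one command
    return "unclassified"


single = classify_command_form(["NHK"], [])
double = classify_command_form(["Television", "rokuga", "shite"], ["Television"])
triple = classify_command_form(
    ["Video", "1", "kara", "Video", "2", "ni", "dubbing"],
    ["Video 1", "Video 2"],
)
```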

Fig. 5 schematically shows a flow of the character control
processing.

When the power of the television monitor 25 is turned
on, the character constituting the assistant begins operating.
The assistant is set to sit and wait in a living room (or one's
private room), for instance, for input of function commands
such as "Channel change", "Recording of dynamic image",
"Reproduction (playback) of recorded content", "Mail write",
"Mail read", "Web search" (i.e., search of the WWW information
space), "Message" (for writing or reading of a message to or
from the bulletin board) and "Service" from the user.

The user may give a command to the assistant in a natural
language form by means of input of a speech through the
microphone 22, for instance. Otherwise, the system 1 also
permits the user to give a command to the assistant through
the mobile telephone from the place where the user is staying.
However, since there is no point in giving function commands
such as "Channel selection" and "Reproduction of recorded
program content" from a remote environment, the functions
practicable by commands through the mobile telephone may be
limited to "Recording of dynamic image", "Mail read" and
"Service" or the like.

Fig. 6 is a view showing a display screen, which appears
immediately after turning the power of the television monitor
25 on.

Immediately after activation of the system, the
assistant is put in the "Ambient" state, that is, the standby
state in which the assistant makes motions in a proper way or
acts as if urging the user to input a command through speech
synthesis.

In the illustrated embodiment, an assistant character
called "Yoshio" is making an appearance in an assistant room
(a character room). Each assistant has a default or
user-specified character. "Yoshio" is set to meet the
following requirements, for instance.

[Formula 1]

"Yoshio" is
a cheerful boy, who lives inside the Television set and
is growing up day after day.

Since he inhabits the Television set, he is well
acquainted with everything about the Television as a matter
of course and also can control most of the equipment connected
to the Television.

Since he is a curious boy, various kinds of things are
put in his room, and up-to-date information also lies scattered
around in his room.

He sometimes makes meaningless motions too. (Well,
allowance must be made for his being a child.)

Although it seems that he has friends or companions,
the persons who have succeeded in gaining access to his friends
or companions are still few.

Well, suppose that he is a good-natured boy, in other
words, a pretty interface.

"Yoshio's" room is scattered with various kinds of objects such
as magazines and toys, for instance. These objects carry links to
various services such as advertising and merchandising services
(including distribution of data content and music content in
addition to physical shopping). That is, user-accessible objects
are scattered about the room, and these objects hint at what
kinds of things the user can gain access to by speaking.

The system 1 permits the user to utilize advertising and other
information resources by providing the room for the assistant.
The assistant, if set to derive a user profile through
interaction with the user, makes it possible to provide more
finely tailored services. The assistant may also be set to cope
with the user's peculiar ways of expression, speaking and the
like.

For instance, if the user speaks to "Yoshio" about an object of
interest, the input user speech is recognized and then
interpreted as a command, and the information resources to which
the target object is linked are called up. The objects scattered
about the room and the assistant's clothes may be changed every
day.

When the system 1 is set to store playable music content in
advance, CDs and a CD player may be provided as accessible
objects in "Yoshio's" room. In this case, each of the scattered
CDs (recording media) has a link to the associated music content.
When the user asks "Yoshio" a question such as "What kind of CD
is this?", the system 1 may answer through "Yoshio" with
something like "This is OA." (refer to Fig. 7), and then start
playback of the music content as if it were being played on the
CD player displayed on the screen.

Alternatively, when CDs relating to a hit chart are provided as
accessible objects displayed on the floor of "Yoshio's" room and
the user asks "Yoshio" a question such as "What kind of CD is
this?", the system 1 may judge the user to be interested in the
musical pieces on the hit chart, and then issue a request for
purchase of a CD (on-line shopping) or the like to a
predetermined server, or download the required music content from
a predetermined site.

In the embodiment shown in Fig. 6, a poster of an automobile is
provided as an accessible object displayed on the wall of
"Yoshio's" room. A poster provided in the room in this way is an
advertising medium or a link to an advertising medium. For
instance, the system 1 may encourage the user to make a purchase
or present advertising information to the user by speaking to
the user through the assistant.

In the embodiment shown in Fig. 6, a globe is also provided as
an accessible object displayed in "Yoshio's" room. The globe may
serve as a metaphor for the WWW, that is, an information
distribution system deployed on a global scale. In this case,
speaking to "Yoshio" about the globe activates a WWW browser (not
shown), permitting WWW search.

In the embodiment shown in Fig. 6, a television set/monitor is
also provided as an accessible object displayed in "Yoshio's"
room. This television set/monitor is a metaphor for the real
television monitor 25 and may be set to show the broadcast
program on the last channel, that is, the last selected channel
(in the embodiment shown, a program on "Channel B" as the last
selected channel is being shown).

Although not shown in Fig. 6, a mailbox may also be provided as
an accessible object displayed in "Yoshio's" room. The mailbox is
a metaphor for a tray that receives electronic mails and may be
set to show a mail image in the mailbox when received mails are
stored.
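
The accessible objects in the room can be thought of as a table
that maps each object to the resource it links to. The following
Python sketch is a minimal illustration under that assumption; the
table layout, the resource labels and the function name are
hypothetical and are not taken from the embodiment.

```python
import re

# Object shown in the room -> resource it links to (labels are illustrative).
ROOM_OBJECTS = {
    "cd": "music content",
    "poster": "advertising medium",
    "globe": "WWW browser",
    "television": "last selected channel",
    "mailbox": "received mails",
}


def resolve_object(utterance):
    """Return the resource linked to the first room object the user mentions."""
    for word in re.findall(r"[a-z]+", utterance.lower()):
        if word in ROOM_OBJECTS:
            return ROOM_OBJECTS[word]
    return None  # nothing in the utterance refers to an accessible object
```

Because the table is plain data, the objects placed in the room
could be swapped daily simply by replacing its entries.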

Incidentally, instead of going directly to the display screen
showing "Yoshio's" room after the power is turned on, the system
1 may be set to let the user enter "Yoshio's" room in response to
user speech or a remote controller operation.

According to the interactive operation support system 1 in the
preferred embodiment of the present invention, applying the
assistant called "Yoshio" to the user interface lets the user
feel friendly toward the user interface, and at the same time
makes it possible to handle complicated commands and to provide
an entry point into services for the user.

A description will now be given of various kinds of operation
support processing in the interactive operation support system 1,
which applies to the user interface an assistant that reacts
through speech synthesis and 3D animation.

(1) Television power ON/OFF

It is assumed that, with the television powered off (provided
that the system 1 is in operation), the user speaks to the
assistant in natural language form, like "Turn on television,
Yoshio!".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13, and the power of the
television is turned on.

"Yoshio's" room as shown in Fig. 6, for instance, may be set to
appear on the initial screen immediately after the power is
turned on. Alternatively, several variations of the way "Yoshio"
makes his appearance on the screen may be provided, so that an
appropriate one is used selectively for each user or according to
the user's environment or the weather conditions. The following
variations of the assistant's appearance may be given, for
instance.

• "Yoshio" appears in "Yoshio's" room with such words as "Hi"
and "Jaan".

• "Yoshio" is wandering around his room in silence.

• "Yoshio" goes into the ambient state delightfully with such
words as "Ah" or "Well".

At this stage, when the user further says something to the
effect of "I want to watch television" to the assistant, the
speech is recognized and then interpreted as a command in the
speech recognition unit 13. Then, the system 1 gives an
affirmative answer such as "OK" or "All right" in speech output
form through "Yoshio", and simultaneously a program on the last
channel (that is, the channel that was selected when the power
was last turned off) is shown while being zoomed up. For
instance, the system may be set to operate the virtual television
set/monitor provided in "Yoshio's" room in such a way that, after
the program on the last channel (as described above) is shown on
the virtual television set/monitor, the virtual television
monitor screen is gradually zoomed up so as to eventually occupy
the whole area of the real monitor screen.

On the other hand, to turn off the television power, it is
enough for the user to speak to the assistant in natural language
form, like "Turn off television, Yoshio!".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13, and the television
power is turned off.

During the period between acceptance of the input command and
the power actually turning off, the system 1 may also provide a
dramatic presentation for the user in which "Yoshio" appears on
the screen, recedes toward the back of the screen with his back
turned and a somewhat sad look, and finally disappears from the
screen.

(2) Channel selection

Channel selection is classified into direct channel selection,
in which a specific channel is selected explicitly by the user,
and zapping channel selection, in which the user picks whatever
he or she likes out of the programs now on the air without
explicitly selecting a specific channel.






Direct channel selection: 

In direct channel selection, since the desired channel is
specified on the basis of user speech input, a command system
that is operationally easy and feels close to natural language is
preferable for the user.

According to the preferred embodiment of the present invention,
applying the assistant called "Yoshio" to the user interface lets
the user feel friendly toward the user interface and at the same
time makes it possible to handle complicated commands and to
provide an entry point into services for the user. A more natural
interface may be constructed by means of interaction with the
personified assistant, which is set to ask the user back about
any part it could not understand when, for instance, the first
user speech input alone is not sufficient to interpret the
user's intention.

Embodiments of direct channel selection operation are 
given in the following. 

(Embodiment 1)

User: "Turn on NHK, Yoshio! Channel 1."
"Yoshio": ("Yoshio" appears and asks back.) "NHK? Channel 1?"
User: "Yes (an affirmative answer)"
"Yoshio": ("Yoshio" waits for the subsequent input of user
speech.)
"Yoshio" then disappears (because there was no input of user
speech for a predetermined period of time).

(Embodiment 2)

User: "Turn on "Kyoiku", Yoshio! Channel 3."
"Yoshio": ("Yoshio" appears and asks back.) "Kyoiku? Channel 3?"
User: "Yes (an affirmative answer)"
"Yoshio": ("Yoshio" waits for the subsequent input of user
speech.)
"Yoshio" then disappears (because there was no input of user
speech for a predetermined period of time).

(Embodiment 3)

User: "Turn on Nihon TV, Yoshio! Channel 4."
"Yoshio": ("Yoshio" appears and asks back.) "Nihon TV? Channel 4?"
User: "Yes (an affirmative answer)"
"Yoshio": ("Yoshio" waits for the subsequent input of user
speech.)
"Yoshio" then disappears (because there was no input of user
speech for a predetermined period of time).

(Embodiment 4)

User: "Turn on TBS, Yoshio! Channel 6."
"Yoshio": ("Yoshio" appears and asks back.) "TBS? Channel 6?"
User: "Yes (an affirmative answer)"
"Yoshio": ("Yoshio" waits for the subsequent input of user
speech.)
"Yoshio" then disappears (because there was no input of user
speech for a predetermined period of time).

(Embodiment 5)

User: "Turn on Fuji TV, Yoshio! Channel 8."
"Yoshio": ("Yoshio" appears and asks back.) "Fuji TV? Channel 8?"
User: "Yes (an affirmative answer)"
"Yoshio": ("Yoshio" waits for the subsequent input of user
speech.)
"Yoshio" then disappears (because there was no input of user
speech for a predetermined period of time).

(Embodiment 6)

User: "Turn on TV Asahi, Yoshio! Channel 10."
"Yoshio": ("Yoshio" appears and asks back.) "TV Asahi? Channel
10?"
User: "Yes (an affirmative answer)"
"Yoshio": ("Yoshio" waits for the subsequent input of user
speech.)
"Yoshio" then disappears (because there was no input of user
speech for a predetermined period of time).

(Embodiment 7)

User: "Turn on TV Tokyo, Yoshio! Channel 12."
"Yoshio": ("Yoshio" appears and asks back.) "TV Tokyo? Channel
12?"
User: "Yes (an affirmative answer)"
"Yoshio": ("Yoshio" waits for the subsequent input of user
speech.)
"Yoshio" then disappears (because there was no input of user
speech for a predetermined period of time).
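
The ask-back step that recurs in Embodiments 1 to 7 can be
sketched as a lookup from station name to channel number. The
station names and channel numbers below are those of the
embodiments; the table and function names are hypothetical, and
simple substring matching stands in for real speech recognition.

```python
# Station name -> channel number, as used in Embodiments 1 to 7.
STATIONS = {
    "NHK": 1, "Kyoiku": 3, "Nihon TV": 4, "TBS": 6,
    "Fuji TV": 8, "TV Asahi": 10, "TV Tokyo": 12,
}


def ask_back(utterance):
    """Build the confirmation question for the station named in the utterance."""
    text = utterance.lower()
    for name, channel in STATIONS.items():
        if name.lower() in text:
            return f"{name}? Channel {channel}?"
    return None  # no known station was named
```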

Fig. 8 to Fig. 11 illustrate the screen display when direct
channel selection is performed through the assistant.

First, it is assumed that the user of the interactive operation
support system 1 is watching, as a television viewer, a
baseball-game relay program now on the air (refer to Fig. 8).
While this program is being shown, the user may carry out direct
channel selection through the assistant by explicitly naming the
desired channel to the assistant, like "Turn on Channel 8,
Yoshio!", "Turn on Fuji TV, Yoshio!" or "Fuji TV".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13. Then, "Yoshio" as the
assistant appears on the monitor screen holding in his hand a
panel (or window) displaying the program currently on the air on
the specified channel (refer to Fig. 9). In this state, the
destination channel is still only in the temporarily selected
state.

Then, the assistant asks the user to confirm the command by
outputting speech such as "This one?". When the user gives an
affirmative answer such as "Yes" in response, the destination
channel changes from the temporarily selected state to the
definitely selected state.
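
The transition from the temporarily selected state to the
definitely selected state amounts to a small two-stage state
machine. The Python sketch below illustrates it; the class and
method names are hypothetical, and the handling of a negative
answer (the temporary selection is simply dropped) is an
assumption, since the description covers only the affirmative
case.

```python
class ChannelSelection:
    """Two-stage channel selection: temporary first, definite on confirmation."""

    def __init__(self, current):
        self.current = current  # definitely selected channel
        self.pending = None     # temporarily selected channel, if any

    def request(self, channel):
        """Put the requested channel in the temporarily selected state."""
        self.pending = channel
        return "This one?"      # the assistant asks for confirmation

    def answer(self, affirmative):
        """Apply the user's answer and return the definitely selected channel."""
        if self.pending is not None and affirmative:
            self.current = self.pending
        self.pending = None     # assumption: a negative answer cancels the request
        return self.current
```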

Once the desired channel is definitely selected, the screen is
scrolled as if the assistant were pushing the panel displaying
the program on the source channel (i.e., the baseball-game relay
program in this embodiment) out to the right in the drawing, so
that the panel displaying the program currently on the air on the
destination channel (Channel 8, in this embodiment) gradually
comes out from behind the previous panel (refer to Figs. 10 and
11).

It is to be understood that the way of clearing away the display
panel of the program on the source channel is not limited to
scrolling the screen in the lateral direction as shown in
Fig. 10. For instance, the screen may be scrolled as if "Yoshio"
as the assistant were pushing the display panel of the program on
the source channel downward or upward.

Figs. 8 to 11 illustrate direct channel selection in a visually
easy-to-follow style by displaying the programs on the air on the
source channel and on the destination channel simultaneously on
the single monitor screen. A multiple-decoding function may be
applied to perform simultaneous display of two or more broadcast
programs.
Incidentally, mapping logotypes and advertising information onto
the clothes or the like of the assistant appearing on the display
screen for direct channel selection has an advertising effect. As
a matter of course, the clothes of the assistant may be changed
according to the season.

Fig. 12 is a flow chart showing the procedure for implementing
the user interface based on a direct command form. According to
this procedure, the interactive operation support system 1 may
implement a more natural user interface by means of interaction
with the personified assistant, which is set to ask the user back
about any part that is difficult to understand from the first
user speech input alone. A description will now be given of the
procedure with reference to the flow chart of Fig. 12.

Firstly, the user speech input through the microphone 22 is
recognized in the speech recognition unit 13 to extract input
keywords, in Step S1.

Subsequently, by searching the databases provided for each
category, a user command corresponding to the input user speech
is specified in Step S2. The databases classified by category are
stored in the storage device 12, for instance, and their contents
may be updated via the network.

When the user speech input so far is not sufficient to specify
the user command, the decision in block S3 is "NO", and the user
is asked back about the missing information by speech output
through the assistant, in Step S5. Then, the processing returns
to Step S1 to wait for the subsequent user speech input.

On the other hand, when the user command is specified by the
user speech input so far, the system issues the required command
to the equipment corresponding to the user command (the
television monitor 25, the video deck 26 or the like, for
instance), in Step S4. Then, the processing returns to Step S1 to
wait for the subsequent user speech input.
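
One pass through Steps S1 to S5 of Fig. 12 can be sketched in
Python as follows. This is an illustrative sketch only: the two
small tables stand in for the category databases, substring
matching stands in for speech recognition, and all names
(`ACTIONS`, `CHANNELS`, `extract_keywords`, `step`) are
hypothetical. Keywords are accumulated across turns, on the
assumption that "the user speech input so far" refers to every
utterance since the last issued command.

```python
# Stand-ins for the category databases searched in Step S2.
ACTIONS = {"turn on": "tune"}
CHANNELS = {"nhk": 1, "fuji tv": 8}


def extract_keywords(speech):
    """S1/S2: pick out the keywords the databases know about."""
    text = speech.lower()
    found = {}
    for phrase, action in ACTIONS.items():
        if phrase in text:
            found["action"] = action
    for name, channel in CHANNELS.items():
        if name in text:
            found["channel"] = channel
    return found


def step(speech, context):
    """Process one utterance; context holds keywords gathered so far."""
    context.update(extract_keywords(speech))
    if "action" in context and "channel" in context:  # S3: command specified
        command = (context["action"], context["channel"])
        context.clear()
        return ("issue", command)                     # S4: send to equipment
    missing = "channel" if "channel" not in context else "action"
    return ("ask", f"Which {missing}?")               # S5: ask the user back
```
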
Zapping channel selection:

When zapping channel selection is desired, the user gives the
assistant a command such as "Show me programs on all the
channels, Yoshio!" or "What program is showing now, Yoshio?"
without selecting a specific channel.

In response to such a speech-based user command, the assistant,
that is, "Yoshio", appears on the monitor screen and displays a
multi-view screen that lets the user see all the programs
currently on the air at a glance.

Fig. 13 shows the multi-view screen, which lets the user see the
programs currently on the air on the respective channels at a
glance. In the embodiment shown in Fig. 13, panels displaying the
programs currently on the air on the respective channels are
arranged in a ring in the room of "Yoshio" as the assistant so as
to surround "Yoshio". Immediately after the change to the
multi-view screen, the leading panel may be set to display the
program currently on the air on the last channel (that is, the
last selected channel), for instance.

On the multi-view screen described above, the user can give a
command by speaking to "Yoshio" as the assistant in a form close
to natural language.

The program display panels are set to shift in sequence along
the ring in such a way that, in response to a command that
specifies a channel, like "Turn on Channel 12", "Channel 12",
"Turn on TV Tokyo" or "TV Tokyo", the display panel of the
program on the specified channel comes to the forefront of the
monitor screen. Further, in response to a command like "one
before (or previous)" or "next one (next)", the program display
panels may be shifted forward or backward one by one along the
ring (refer to Fig. 14).

Fig. 15 is a virtual view from above showing the state of the
multi-view screen. As shown in Fig. 15, the program display
panels are placed at substantially uniform intervals along the
ring, along which the panels are shifted, in "Yoshio's" room. The
display panel of the program on the temporarily selected channel
is placed at the forefront of the ring.

As shown in Fig. 15, a part of the ring has a gap with no
program display panel. Since the gap shifts in sequence together
with the program display panels along the ring according to the
channel shift commands from the user, the user can visually
follow the shift of the channels by tracking the position of the
gap with his or her eyes.

When a certain program display panel has stayed at the forefront
of the screen for a predetermined period of time (several
seconds, for instance), the corresponding channel goes into the
temporarily selected state, and the display panel of the program
on the temporarily selected channel is highlighted. Fig. 16 shows
channel C highlighted after being put in the temporarily selected
state.
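
The ring of program display panels, its gap, and the dwell time
before temporary selection can be modelled with a rotating queue.
The Python sketch below is a minimal illustration: the class name,
the three-second dwell time and the rule that the gap is skipped
when it would reach the forefront are all assumptions rather than
details taken from the embodiment.

```python
from collections import deque


class PanelRing:
    """Program display panels arranged on a ring with one empty slot."""

    def __init__(self, channels, now=0.0, dwell_seconds=3.0):
        self.ring = deque(list(channels) + [None])  # None marks the gap
        self.dwell_seconds = dwell_seconds          # "several seconds" (assumed 3)
        self.front_since = now

    def front(self):
        return self.ring[0]

    def shift(self, steps, now):
        """Shift panels: positive steps for "next", negative for "previous"."""
        self.ring.rotate(-steps)
        if self.ring[0] is None:                    # assumption: skip over the gap
            self.ring.rotate(-1 if steps > 0 else 1)
        self.front_since = now

    def bring_to_front(self, channel, now):
        """Handle a command that names a channel, such as "Channel 12"."""
        self.ring.rotate(-self.ring.index(channel))
        self.front_since = now

    def temporarily_selected(self, now):
        """Return the front channel once it has stayed in front long enough."""
        if now - self.front_since >= self.dwell_seconds:
            return self.front()
        return None
```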

When the display panel of the program on a desired channel comes
to the forefront of the screen, the user may definitely select
the temporarily selected channel by speaking emphatically to the
assistant about it, like "That", "That one", "Show me that" or
"Show me this one".

After the desired channel is definitely selected, the selected
channel is tuned in, and simultaneously the display panel of the
program on the selected channel is gradually zoomed up so as to
eventually occupy the whole monitor screen.

Fig. 17 shows the state in which, after transition of channel C
from the temporarily selected state to the definitely selected
state, the display panel of the program on the definitely
selected channel is gradually zoomed up.

Fig. 13 illustrates zapping channel selection in a visually
easy-to-follow style on the multi-view screen by displaying the
programs currently on the air on the respective channels
simultaneously on the single monitor screen. A multiple-decoding
function may be applied to perform simultaneous display of two or
more broadcast programs.

Incidentally, each program display panel appearing on the
multi-view screen need not always display a program currently on
the air. For instance, the panels may also display recorded
program content played back from recorded data stored in the
video deck or another storage device.

(3) Recording

The interactive operation support system 1 according to the
preferred embodiment of the present invention is connected to one
or more video decks 23 as external equipment, and may therefore
specify a video deck as the recording destination of a received
broadcast program. The internal storage device 12, configured as
a hard disc drive, may also be specified as the recording
destination.

The interactive operation support system 1 according to the
preferred embodiment of the present invention may provide a more
natural user interface by means of speech-based interaction
through the personified assistant. Thus, the user may give a
desired program recording command by speaking to the assistant in
a nearly natural language form, without depending on conventional
equipment operations such as use of the remote controller.

Fig. 18 shows a state in which a speech-based program recording
command is given to the assistant.

As shown in Fig. 18A, it is assumed that the user is currently
watching the baseball-game relay program, for instance. Whenever
the user wishes to record the program currently on the air, the
user can give a command by speaking to the assistant, like
"Record this program" or "Record this".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13, and as a result, the
system 1 identifies the input command as a request to record the
program currently on the air.

Then, the system 1 searches for a free area in the externally
connected video deck 23 or the built-in hard disc. After
confirming the recording destination, the system 1 gives an
affirmative answer such as "Yes" to the user through the
assistant by means of speech synthesis.
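
The search for a recording destination can be sketched as a scan
over the available devices. In the Python sketch below, the
function names, the use of minutes as the unit of free space, the
order of preference and the reply given when no space is found are
all assumptions made for illustration.

```python
def choose_destination(devices, needed_minutes):
    """Return the first device with enough free space, or None.

    devices is a list of (name, free_minutes) pairs in order of preference.
    """
    for name, free_minutes in devices:
        if free_minutes >= needed_minutes:
            return name
    return None


def reply(destination):
    """Assistant's spoken answer; the negative wording is an assumption."""
    return "Yes" if destination is not None else "There is no free space."
```

For example, with `[("video deck", 10), ("hard disc", 120)]` and a
60-minute program, the hard disc would be chosen.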

As shown in Fig. 18B, a recording icon indicating that a program
is now being recorded is displayed on the monitor screen,
together with a counter showing the recording time.

(4) Recording of programs scheduled to broadcast

The interactive operation support system 1 also permits
recording of programs scheduled to be broadcast, that is,
reserved recording, in addition to recording of programs
currently on the air.

The interactive operation support system 1 according to the
preferred embodiment of the present invention may provide a more
natural user interface by means of speech-based interaction
through a personified assistant. Thus, the user may give a
recording reservation command by speaking to the assistant in a
nearly natural language form, without depending on conventional
equipment operations such as use of the remote controller.

Fig. 23 and Fig. 24 illustrate the screen display when a
recording reservation is set through the assistant.

It is assumed that the user gives the system, through the
assistant, speech input suitable for obtaining information on
scheduled programs, such as "What program is starting from now?
Yoshio!", "What comes next? Yoshio!", "Show me EPG (Electronic
Programming Guide), Yoshio!" or "What program is starting from
8:00? Yoshio!".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13. Then, "Yoshio" as the
assistant displays a list of the programs scheduled to be
broadcast in the form of a matrix (refer to Fig. 23). The EPG
distributed as data for data broadcasting, for instance, may be
used to generate the list of programs.

The assistant may also be set to read the list of programs
aloud after it is displayed on the monitor screen as shown in
Fig. 23.

The user can easily pick a program to record from the programs
displayed in list form. Then, the user gives a command to the
system by speaking to "Yoshio" as the assistant in natural
language form, like "Record a program scheduled to be on the air
from 8:00 on Channel D".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13. Then, the selected
channel D is highlighted. "Yoshio" as the assistant
outputs speech urging the user to confirm the selected channel,
like "Is it this one?", while pointing at the column of
channel D.

When the user gives an affirmative answer in response to the
request to confirm the selected channel, reserved recording is
set. In this case, the system 1 may be set not only to highlight
the column of the program specified for recording but also to
display a recording reservation icon (not shown).
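
The reservation flow (look the program up in the list, confirm,
then set the reservation) can be sketched as below. The EPG
contents, the program titles and the function name are invented
for illustration; only the channel letter and start time echo the
example in the text.

```python
# Hypothetical EPG excerpt: (channel, start time) -> program title.
EPG = {("D", "8:00"): "Drama", ("C", "8:00"): "News"}


def reserve(epg, reservations, channel, start, confirmed):
    """Set a reserved recording if the program exists and the user confirmed."""
    if (channel, start) not in epg:
        return False                 # nothing scheduled in that slot
    if not confirmed:
        return False                 # user did not give an affirmative answer
    reservations.append((channel, start, epg[(channel, start)]))
    return True
```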

(5) Reproduction of recorded program content

The interactive operation support system 1 according to the
embodiment of the present invention may provide a more natural
user interface by means of speech-based interaction through the
personified assistant. Thus, the user may give not only a desired
program recording command but also a recorded program
reproduction command by speaking to the assistant in natural
language form, without depending on conventional equipment
operations such as use of the remote controller.

When a random-access storage device such as a hard disc device
is used as the program recording destination, any recorded
program content may be retrieved and its reproduction started.

Fig. 19 to Fig. 22 illustrate the screen display when
reproduction of recorded program content is performed through the
assistant.

Firstly, it is assumed that the user of the interactive
operation support system 1 is watching, as a television viewer, a
musical program currently on the air (refer to Fig. 19). While
the program is on the air, the user may give a reproduction
command to the system by explicitly speaking to the assistant
about reproduction of recorded program content, like "Reproduce a
recorded program" or "Reproduce a recorded video".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13. "Yoshio" as the
assistant displays an image of a binder holding the contents
(thumbnails) of the recorded programs (refer to Fig. 20).

The user may specify the recorded program content to be
reproduced, using the thumbnails of the recorded program contents
displayed in binder form as a cue. It is then enough to read out
the program content to be reproduced, like "Channel G".

The input user speech is recognized and then interpreted as a
command in the speech recognition unit 13. Then, the thumbnail of
the target recorded program content is highlighted while being
zoomed up (refer to Fig. 21).

The user may confirm the target recorded program content for
reproduction using the zoomed-up thumbnail display as a cue.
Then, the user may give a reproduction start command to the
system through "Yoshio" by speaking to the assistant in natural
language form, like "This one" or "That one". The input user
speech is recognized and then interpreted as a command in the
speech recognition unit 13. Speech representing an affirmative
answer such as "Yes" is produced by the system through "Yoshio"
(refer to Fig. 22).

Then, the thumbnail of the target recorded program content is
zoomed up so as to occupy the whole monitor screen, and
reproduction of the target recorded program content is started.

Incidentally, the system 1 may also be set to let recorded
program content be specified for reproduction on a daily basis,
in addition to the above recorded program selecting mode. A
scenario for such a case is given briefly in the following.

User: "Reproduce program contents having been recorded
yesterday, Yoshio."

-> The display panels of the program contents recorded yesterday
revolve around "Yoshio" (refer to Fig. 25).

User: "Reproduce a program content on Channel C."

"Yoshio": "Is it this one?"

User: "Yes" (refer to Fig. 26).

-> The specified program display panel is gradually zoomed up so
as to occupy the whole screen, and reproduction of the specified
program content is started (not shown).
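
Selecting recorded content "on a daily basis" comes down to
filtering the recordings by recording date. The short Python
sketch below illustrates this; the record layout, the function
name and the sample data (including the fixed date, chosen only to
make the example reproducible) are hypothetical.

```python
from datetime import date, timedelta


def recorded_on(recordings, day):
    """Return the channels of all program contents recorded on the given day."""
    return [r["channel"] for r in recordings if r["recorded"] == day]


today = date(2000, 7, 20)  # arbitrary fixed date for a reproducible example
recordings = [
    {"channel": "C", "recorded": today - timedelta(days=1)},
    {"channel": "G", "recorded": today - timedelta(days=3)},
]
# Panels to revolve around the assistant for "recorded yesterday".
yesterday_panels = recorded_on(recordings, today - timedelta(days=1))
```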

(6) Mail

As shown in Fig. 1, the interactive operation support system 1
according to the embodiment of the present invention is connected
to the external network via the communication interface 18, and
therefore permits exchange of mails by making use of a mail
server on the network.

The interactive operation support system 1 according to the
embodiment of the present invention also permits exchange of
mails by means of speech-based interaction through the
personified assistant. Reception of mails with speech-based
support by the assistant will be described in the following.

When a mail is received while a program is on the air, the
system 1 displays, over the program content, an envelope icon
informing the user that the mail has been received, in such a way
that it flutters down through the program content (refer to
Fig. 27).

Then, the envelope icon disappears when it reaches the lower end
of the monitor screen, while an incoming mail icon indicating how
many mails have been received appears on the top right-hand side
of the screen. The embodiment shown in Fig. 28 is a case where
one mail has been received.

The user may give a command to open the mailbox by speaking to
the assistant in natural language form, like "Show me a mail,
Yoshio!", for instance. That is, the input user speech is
recognized and then interpreted as a command in the speech
recognition unit 13. Then, "Yoshio" as the assistant makes
motions to open the mail. The assistant may also be set to read
the text of the mail aloud after interpreting it and applying
speech synthesis.

Incidentally, when displaying the mail on the monitor screen,
the system 1 may also be set to convert the original text data
from Japanese ideograms (Kanji) into phonetic characters (Kana)
to make it easy for children to read. Further, by means of speech
control, the system 1 can provide elderly people who have
difficulty reading with an operating environment simpler than
one that requires searching for the silk-printed characters on
buttons.

Moreover, another preferred embodiment of the present
invention may convert text data from the characters or codes
of one system into those of another system, for example,
between different languages or different alphabet systems.

Further, the wall pattern of the mail display window may be
customized for each user. For instance, switching the mail
wallpaper pattern depending on whether the sender
is the user's father, mother, child or friend permits the user
to grasp at a glance who the sender is. As
a matter of course, switching the character font serves
this purpose as well.

Fig. 30 is a flow chart showing the procedure for
displaying the accepted mail on the monitor screen. A
description will now be given of the processing of displaying
the accepted mail with reference to the flow chart shown in
Fig. 30.

Firstly, text data contained in the body of the accepted mail
is extracted, in Step S11.

Subsequently, Kanji (ideograms) contained in the text
data are converted into the corresponding Kana (phonetic
characters), in Step S12. In this stage, not all Kanji need
be converted into Kana. The system 1 may also be
set to judge whether conversion into Kana is required based
on the user's age and other user profile data or preferences,
for instance.

Subsequently, the text data converted into Kana is
expanded into bit map information by making use of a font
database, in Step S13. A plurality of kinds of font databases
are stored in the storage device 12, for instance. A required
font database may be selected with reference to the user
profile.

Then, the text expanded into the bit map information
is superposed on a texture serving as a so-called wall pattern
to synthesize a mail display image to be projected onto
the monitor screen, in Step S14. A plurality of kinds of
texture databases are stored in the storage device 12, for
instance. A required texture may also be selected with
reference to the user profile.
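
The procedure of Fig. 30 (Steps S11 to S14) may be sketched in
Python as follows. This is only an illustrative sketch under
stated assumptions: the reading table KANJI_TO_KANA, the
UserProfile fields and the age threshold are hypothetical, and
the bit map expansion and texture superposition are represented
by plain dictionaries rather than real image data.

```python
from dataclasses import dataclass

# Hypothetical reading table; a real system would use a full dictionary.
KANJI_TO_KANA = {"明日": "あした", "雨": "あめ"}


@dataclass
class UserProfile:
    name: str
    age: int
    font: str = "gothic"        # assumed key into a font database
    wallpaper: str = "plain"    # assumed key into a texture database


def extract_text(mail: dict) -> str:
    """Step S11: extract the text data from the mail body."""
    return mail["body"]


def needs_kana(profile: UserProfile, threshold_age: int = 10) -> bool:
    """Judge from the profile whether Kanji should be converted (assumed rule)."""
    return profile.age < threshold_age


def kanji_to_kana(text: str) -> str:
    """Step S12: replace each known Kanji word with its Kana reading."""
    for kanji, kana in KANJI_TO_KANA.items():
        text = text.replace(kanji, kana)
    return text


def render_mail(mail: dict, profile: UserProfile) -> dict:
    text = extract_text(mail)                        # S11
    if needs_kana(profile):
        text = kanji_to_kana(text)                   # S12
    glyphs = {"font": profile.font, "text": text}    # S13: stand-in for bit map expansion
    return {"texture": profile.wallpaper, "glyphs": glyphs}  # S14: stand-in for superposition
```

Keeping the conversion decision in a separate function reflects
the statement that not all Kanji need be converted and that the
judgment may depend on the user profile.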

(7) Message (Bulletin board)

A message exchange system using a computer, such as
a BBS (Bulletin Board System), is already in general use in the
information processing and information communication fields.

In a conventional message bulletin board system of
this type, each user needs to write a message by character
input onto a bulletin board provided by a specific server,
whereby the system permits other users to read this message.

On the other hand, according to the message bulletin board
provided by the interactive operation support system 1 in the
preferred embodiment of the present invention, messages may
be input and made public by means of speech-based interaction
through the medium of the personified assistant. When one
user inputs a message and instructs the assistant that the
message is bound for a particular destination, the assistant
reads the message aloud only to that particular destination.
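
The destination handling described above may be sketched in
Python as follows. This is a minimal sketch, not the disclosed
implementation: the Message and MessageBoard classes and their
method names are hypothetical, and speaker identification is
assumed to have been done elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    sender: str
    text: str
    destination: Optional[str] = None  # None means open to every user


@dataclass
class MessageBoard:
    messages: List[Message] = field(default_factory=list)

    def leave(self, sender: str, text: str, destination: Optional[str] = None) -> None:
        """Store a message spoken to the assistant."""
        self.messages.append(Message(sender, text, destination))

    def read_for(self, listener: str) -> List[str]:
        """Return the messages the assistant may read aloud to this listener."""
        return [
            m.text
            for m in self.messages
            if m.sender != listener and m.destination in (None, listener)
        ]
```

A message left without a destination is readable by any other
user, while a message bound for a particular destination is
returned only when that user asks.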

An embodiment of a message scenario on the message bulletin
board provided by the interactive operation support system 1
in the embodiment of the present invention is given in the
following (Refer to Fig. 31).
(Embodiment)

Mother (User 1): "Hi, Yoshio, I will go shopping for a while.
I will be back at 6:00."
(She goes out.)

Takuro (User 2): "I'm home, Mother." (Takuro, her child, comes
back home.)

Takuro: "There?" (He understands that his mother is absent.)
Takuro: "Do you know where my mother is, Yoshio?"
"Yoshio" (Assistant): "Your mother is out shopping now. She
will be back soon."

Takuro: "I see, thank you."

(8) Services

The interactive operation support system 1 according
to the preferred embodiment of the present invention may
establish communication with the user by means of speech-based
interaction through the medium of the personified
assistant.

The input user speech is not limited to commands
for the system 1 and for equipment such as the video deck
externally connected to the system. For instance, interaction
in the form of ordinary conversation may also be established.
Some embodiments of the user-assistant interaction of this
type are given in the following.

(Embodiment 1)
User: "What will the weather be like tomorrow?"
"Yoshio": "It will rain." (He speaks in a sad tone.)

(Embodiment 2)
User: "Do you think there is heavy traffic on Chuo Highway,
Yoshio?"
"Yoshio": "Well, so-so." (He speaks in a cool tone.)

(Embodiment 3)
User: "What time is it now, Yoshio?"
"Yoshio": (He shows his wristwatch to the user in silence.)

(Embodiment 4)
User: "What time is it now in San Francisco, Yoshio?"
"Yoshio": (In silence, he shows the user the wristwatch on his
glove patterned with the Stars and Stripes.)

(Embodiment 5)
User: "Hi, Yoshio."
"Yoshio": "What is the matter?"
User: "Don't forget to call me at 6:00 tomorrow morning."
"Yoshio": "Why?"
User: "I have to join an important meeting."
"Yoshio": "I see."

(The television is turned on at 6:00 the next morning.)

"Yoshio": "Good morning. You will join an important meeting,
won't you?"

(Embodiment 6)
User: "Hi, Yoshio, I'm hungry."
"Yoshio": "I suppose that the pizza shop is still open. Shall
I place an order?"

(9) Remote control

The interactive operation support system 1 according
to the preferred embodiment of the present invention permits
exchange of data to or from an installed telephone or a
mobile telephone via the modem 18 or the public telephone
line (as already described above). Similarly, the
system 1 is connected to a wide area network such as the
Internet via the communication interface 18, and therefore
also permits exchange of data to or from a remote information
terminal such as a personal computer.

Thus, the personified assistant provided by the
interactive operation support system 1 may communicate
with the user by carrying on speech-based interaction
with the user through a remote terminal such
as a personal computer or a mobile telephone. For
instance, the system may accept an operation command for
external equipment such as the video deck through the mobile
telephone.

However, accepting user commands input from a remote place
without restriction risks infringing the user's privacy or
indoor security. For this reason, the interactive operation
support system 1 is set to demand input of certificate
information for verifying the legitimacy of the user on the
side of the remote information terminal such as a mobile
telephone or a personal computer. A medium such as an ID card,
for instance, may be used for input of the certificate
information. Speech and data accepted into the system 1
through a certifying device are then interpreted as commands
and executed (Refer to Fig. 32).
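
The gating of remote input by certificate information may be
sketched in Python as follows. This is only a sketch under
assumptions: the REGISTERED table, the function names and the
form of the certificate (a simple string) are hypothetical, and
the interpreter is passed in as a callable because the command
interpretation itself is outside the scope of the sketch.

```python
import hmac
from typing import Callable, Optional

# Hypothetical registry of certificate information per user.
REGISTERED = {"mother": "id-card-1234"}


def is_certified(user: str, certificate: str) -> bool:
    """Check the presented certificate against the registered one."""
    expected = REGISTERED.get(user)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), certificate.encode())


def handle_remote_input(
    user: str,
    certificate: str,
    utterance: str,
    interpret: Callable[[str], dict],
) -> Optional[dict]:
    """Interpret the utterance as a command only for a certified user."""
    if not is_certified(user, certificate):
        return None  # input from an uncertified terminal is not executed
    return interpret(utterance)
```

Input that fails certification never reaches the interpreter,
so an uncertified caller cannot trigger operation of the
connected equipment.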

An embodiment of a scenario in which remote control from
the user is accepted through the personified assistant by the
interactive operation support system 1 according to the
preferred embodiment of the present invention is given in the
following (Refer to Figs. 33 to 35).

(Embodiment)

User: "Is Yoshio there?" (The user makes a mobile telephone
call to the user's home.)
"Yoshio": "Yes. I am on the line."
User: "Record the program starting at 8:00 today on Channel
NHK."
"Yoshio": "The program starting at 8:00 on Channel NHK?"
User: "Yes."
"Yoshio": "OK."

(10) Extraction of text information from the content of a
program currently on the air

The interactive operation support system 1 according
to the preferred embodiment of the present invention includes
a tuner 15 for channel selection, that is, tuning to the
broadcast wave of a predetermined channel. The received
broadcast wave is separated into a video data portion and an
audio data portion. The video data is outputted to the monitor
25 through an image processing unit 16 for display on the
screen, while the audio data is outputted to a speaker 21
through a speech synthesis unit 14 for production of sounds.

Decoded video data sometimes contains text information
such as superimposed captions annexed to the program content,
in addition to the principal content of the program. For
instance, information on the progress of scoring in a sports
relay program such as a baseball game or a soccer game,
and time information, may be included.

For instance, suppose that the user wants to know the
progress of a game while a sports program is on the air on
a channel other than the one being watched.

In the interactive operation support system 1 according
to the preferred embodiment of the present invention, the
tuner is provided with a plurality of channel selection
functions, so that a sports program on a different channel
can be received and decoded by using a channel selection
function that is free while a program on a certain channel is
being selected. Then, the decoded video data is subjected to
image measurement and recognition to extract the text
information associated with the progress of scoring.
The system 1 may also be set to allow the personified assistant
to read the extracted text information aloud, or to inform the
user of the extracted text information in the form of
a superimposed caption or a sub-window displayed over the
content of the program currently shown on the screen.

Fig. 36 is a flow chart showing the procedure of informing
the user of the text information contained in a program on
a different channel. A description will now be given of
the processing of informing the user of the text information
with reference to the flow chart of Fig. 36.

Firstly, a score display area is extracted from
the broadcast video data, in Step S21.

Subsequently, pattern matching is performed to extract
score information from the score display area, in Step S22.

Pattern matching over the whole video data may also
be used to search for the associated text information.
However, as shown in Fig. 37, since the score display area is
usually placed at substantially the same location, the
displayed score can be found easily and quickly by using the
location of the score display area as a key.

Subsequently, it is decided whether or not the extracted
score information has changed from the previous result of
extraction, in Step S23.

When the score information has changed, the system
informs the user of the change, in Step S24. The change of
score information may be reported by the speech of the
personified assistant through speech synthesis, for instance,
or may be displayed on the screen by means of a sub-window,
a 3D display, a texture or 2-D alpha-blending.

On the other hand, when the score information remains
unchanged, the processing returns to Step S21 and the same
processing as above is repeated. Incidentally, the
system may also be set to inform the user of the score
information at predetermined time intervals, even if the score
information remains unchanged.
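
The loop of Fig. 36 (Steps S21 to S24) may be sketched in
Python as follows. This is a simplified sketch, not the
disclosed implementation: each frame is assumed to be a
dictionary whose hypothetical "score_area" entry already holds
the text recognized in the score display area, and a regular
expression stands in for pattern matching on image data.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# Assumed on-screen format such as "TOK 1 - 0 OSA".
SCORE_PATTERN = re.compile(r"(\d+)\s*-\s*(\d+)")


def crop_score_area(frame: dict) -> str:
    """Step S21: take the score display area at its known location."""
    return frame.get("score_area", "")


def match_score(area_text: str) -> Optional[Tuple[int, int]]:
    """Step S22: pattern matching to extract the score information."""
    m = SCORE_PATTERN.search(area_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def monitor_scores(frames: Iterable[dict], notify: Callable[[Tuple[int, int]], None]) -> None:
    previous = None
    for frame in frames:
        score = match_score(crop_score_area(frame))
        # Step S23: compare with the previous result of extraction.
        # The first score found counts as a change in this sketch.
        if score is not None and score != previous:
            notify(score)          # Step S24: inform the user
            previous = score
        # Otherwise continue with the next frame (return to Step S21).
```

The notify callable may be bound to speech output by the
assistant or to an on-screen sub-window; a periodic report of
an unchanged score would require an additional timer, which is
omitted here.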

By a method similar to the above, displayed
time information may be extracted from the program content,
in addition to the score information of the sports program.

For instance, when the user is watching a television
program containing displayed time information, or when a
plurality of tuners are provided and a spare tuner monitors
a program containing displayed time information on a different
channel, the time information may be measured by an image
measurement/recognition technique.

For instance, when a plurality of pieces of external equipment
are locally connected to the single interactive operation
support system 1, or remotely connected through a communication
medium such as a home network, setting the time
information of all the equipment correctly is important but
complicated. Moreover, if the current time information is not
accurately synchronized among the equipment, malfunctions
result. For instance, when the time information on the tuner
side is not correct at the time of reserved recording of
a program, the system may fail to record the program.

On the other hand, according to the interactive operation
support system 1 of the preferred embodiment of the present
invention, the current time information of
the television monitor 25 or other externally connected
equipment may be set automatically by measuring the current
time through image measurement/recognition processing, when
the user is watching a television program containing displayed
time information, or when the tuner is provided with a
plurality of channel selection functions and decodes a program
containing displayed time information on a different channel
by the use of a spare tuner.

Fig. 38 is a flow chart showing the procedure of setting
time information based on the time information displayed in
the broadcast program content. A description will now be
given of the procedure of setting the time information with
reference to the flow chart of Fig. 38.

Firstly, a time display area is extracted from
the broadcast video data, in Step S31.

Subsequently, pattern matching is performed to extract
time information from the time display area, in Step S32.

Pattern matching over the whole video data may also
be used to search for the associated text information.
However, as shown in Fig. 39, since the time display area is
usually placed at substantially the same location, the
displayed time information can be found at high speed by using
the location of the time display area as a key.

Subsequently, it is decided whether or not the extracted
time information has changed from the previous result of
extraction, in Step S33.

When the time information has changed, the system sets
the extracted time information as the current time, in Step
S34. The other connected external equipment is also set to
the extracted time information, in Step S35.

On the other hand, when the time information remains
unchanged, the processing returns to Step S31 and the same
processing as above is repeated.
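
The procedure of Fig. 38 (Steps S31 to S35) may be sketched in
Python as follows. This is a simplified sketch under
assumptions: each frame is a dictionary whose hypothetical
"time_area" entry already holds the text recognized in the
time display area, the Device class with its set_clock method
is hypothetical, and a regular expression stands in for
pattern matching on image data.

```python
import re
from datetime import time
from typing import Iterable, List, Optional

# Assumed on-screen format such as "7:59".
TIME_PATTERN = re.compile(r"\b(\d{1,2}):(\d{2})\b")


def match_time(area_text: str) -> Optional[time]:
    """Step S32: pattern matching to extract the displayed time."""
    m = TIME_PATTERN.search(area_text)
    if not m:
        return None
    hour, minute = int(m.group(1)), int(m.group(2))
    if hour > 23 or minute > 59:
        return None  # reject a misrecognized value
    return time(hour, minute)


class Device:
    """Stand-in for the system itself or a connected piece of equipment."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clock: Optional[time] = None

    def set_clock(self, t: time) -> None:
        self.clock = t


def sync_clocks(frames: Iterable[dict], system: Device, devices: List[Device]) -> None:
    previous = None
    for frame in frames:
        t = match_time(frame.get("time_area", ""))   # S31 and S32
        if t is None or t == previous:               # S33: unchanged, try the next frame
            continue
        system.set_clock(t)                          # S34: set the current time
        for device in devices:                       # S35: propagate to connected equipment
            device.set_clock(t)
        previous = t
```

Rejecting out-of-range values before setting any clock is a
design choice of this sketch, intended to keep a recognition
error from propagating to all connected equipment.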

As has been described in the foregoing, the present
invention makes it possible to provide the system and method
for supporting operations for input of user commands to the
household electric equipment such as the television
set/monitor and the information equipment.

The present invention further makes it possible to 
provide the system and method for supporting interactive 
operations, permitting input of user commands to the equipment 
interactively.

The present invention still further makes it possible
to provide a system and method for supporting interactive
operations, permitting input of user commands to equipment
or apparatuses in a form close to natural human interaction
through the personified assistant.

The present invention yet further makes it possible to
provide a system and method for supporting interactive
operations, permitting input of user commands by means of
speech-based interaction with the personified assistant.

The present invention yet further makes it possible to
provide the system and method for supporting interactive
operations, permitting feedback to the user of the progress
of operations specified by the user commands inputted by means
of speech-based interaction with the assistant.

While the present invention has been described in detail
with reference to the specific preferred embodiments, it is
to be understood that modifications and variations will be
apparent to those skilled in the art without departing from
the scope and spirit of the present invention.

While the description in the present specification has
been given based on the preferred embodiments, in which the
interactive operation support system according to the present
invention is applied to television operations, it is to be
understood that the scope of application of the present
invention is not limited to the above embodiments. The present
invention may likewise have effects on other kinds of household
electric equipment and information equipment having the
functions of generating and displaying the personified
assistant, of inputting, recognizing and synthesizing
speech, and of carrying on a speech-based conversation with
the user.

In other words, the present invention has been disclosed by
way of illustration in its preferred form and is not to be
construed as restrictive.